Clustering with mixtures of log-concave distributions
Authors
Abstract
The EM algorithm is a popular tool for clustering observations via a parametric mixture model. Two disadvantages of this approach are that its success depends on the appropriateness of the assumed parametric model, and that each model requires a different implementation of the EM algorithm based on model-specific theoretical derivations. We show how this algorithm can be extended to work with the flexible, nonparametric class of log-concave component distributions. The advantages of the resulting algorithm are as follows. First, it is not restricted to parametric models: one no longer needs to specify such a model, and the results are no longer sensitive to its misspecification. Second, only one implementation of the algorithm is necessary. Furthermore, simulation studies based on the normal mixture model suggest that this more general nonparametric algorithm incurs no noticeable performance penalty vis-à-vis the parametric EM algorithm in the special case where the assumed parametric model is indeed correct. © 2007 Elsevier B.V. All rights reserved. MSC: 62G07; 62G20; 62G35
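To make the structure concrete, the following is a minimal sketch of the EM iteration described above, with Gaussian components standing in as placeholders. In the nonparametric version the paper proposes, the M-step would instead fit a weighted log-concave maximum likelihood density to each component; the helper weighted_logconcave_mle named in the comment is hypothetical and only marks that swap-in point.

    import numpy as np
    from scipy.stats import norm

    def em_mixture(x, k, n_iter=200, seed=0):
        rng = np.random.default_rng(seed)
        # Initialize: means at random data points, common scale, equal weights.
        mu = rng.choice(x, size=k, replace=False)
        sigma = np.full(k, x.std())
        w = np.full(k, 1.0 / k)
        for _ in range(n_iter):
            # E-step: responsibilities r[i, j] = P(component j | x_i).
            dens = w * norm.pdf(x[:, None], mu, sigma)          # shape (n, k)
            r = dens / dens.sum(axis=1, keepdims=True)
            # M-step: refit each component from its weighted sample.
            w = r.mean(axis=0)
            # Parametric (normal) update shown here; the nonparametric variant
            # would instead fit a weighted log-concave MLE per component, e.g.
            #   f_j = weighted_logconcave_mle(x, weights=r[:, j])  # hypothetical
            mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
            sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / r.sum(axis=0))
        return w, mu, sigma, r

    # Cluster by the maximizing component of the final responsibilities.
    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(0, 1, 150), rng.normal(4, 1, 150)])
    labels = em_mixture(x, k=2)[3].argmax(axis=1)

Because only the M-step changes between the parametric and nonparametric versions, a single implementation of the surrounding loop suffices, which is the second advantage the abstract claims.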
Similar articles
On Spectral Learning of Mixtures of Distributions
We consider the problem of learning mixtures of distributions via spectral methods and derive a tight characterization of when such methods are useful. Specifically, given a mixture-sample, let μ_i, C_i, w_i denote the empirical mean, covariance matrix, and mixing weight of the i-th component. We prove that a very simple algorithm, namely spectral projection followed by single-linkage clustering, ...
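A minimal sketch of the two-step procedure named in this snippet, assuming a plain PCA projection onto k components and SciPy's hierarchical clustering; the paper's precise projection and separation conditions are not reproduced here.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def spectral_single_linkage(X, k):
        # Spectral projection: project the centered data onto its top-k
        # right singular vectors (the top principal components).
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        Z = Xc @ Vt[:k].T
        # Single-linkage clustering in the projected space.
        tree = linkage(Z, method="single")
        return fcluster(tree, t=k, criterion="maxclust")

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 50)), rng.normal(3, 1, (100, 50))])
    labels = spectral_single_linkage(X, k=2)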
Non-parametric log-concave mixtures
Finite mixtures of parametric distributions are often used to model data that are known or suspected to contain subpopulations. Instead of a parametric model, a penalized likelihood smoothing algorithm is developed. The penalty is chosen to favor a log-concave result. The standard EM algorithm ("split and fit") can be used. Theoretical results and applications are presented. © 2006 El...
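A hedged illustration of the idea: penalized Poisson likelihood smoothing of histogram counts, where positive second differences of the log-density are penalized heavily and negative ones only lightly, so the fit is pushed toward log-concavity without enforcing it exactly. The penalty form and the weights lam and eps are illustrative assumptions, not the paper's exact construction.

    import numpy as np

    def logconcave_smooth(y, lam=1e4, eps=1e-4, n_iter=100):
        # Smooth histogram counts y on an equispaced grid by penalized
        # Poisson likelihood with an asymmetric second-difference penalty
        # on eta = log(mu): weight lam where the fit is locally convex,
        # weight eps where it is already concave.
        y = np.asarray(y, dtype=float)
        m = len(y)
        D = np.diff(np.eye(m), n=2, axis=0)        # second-difference operator
        eta = np.log(y + 1.0)
        for _ in range(n_iter):
            mu = np.exp(eta)
            v = np.where(D @ eta > 0, lam, eps)    # asymmetric penalty weights
            P = D.T @ (v[:, None] * D)
            # Damped Newton step for the penalized Poisson log-likelihood.
            target = np.linalg.solve(np.diag(mu) + P, mu * eta + y - mu)
            eta += 0.5 * (target - eta)
        return np.exp(eta)                         # smoothed expected counts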
Learning mixtures of structured distributions over discrete domains
Let C be a class of probability distributions over the discrete domain [n] = {1, . . . , n}. We show that if C satisfies a rather general condition – essentially, that each distribution in C can be well-approximated by a variable-width histogram with few bins – then there is a highly efficient (both in terms of running time and sample complexity) algorithm that can learn any mixture of k unknow...
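The condition in this snippet rests on approximating each distribution by a variable-width histogram with few bins. Below is a minimal sketch of that primitive, assuming a squared-error criterion and a straightforward O(B·n²) dynamic program; the paper's actual algorithm and error metric differ.

    import numpy as np

    def best_histogram(p, B):
        # Best B-bin variable-width histogram for a distribution p over
        # {0, ..., n-1}, minimizing squared error, by dynamic programming.
        n = len(p)
        s1 = np.concatenate([[0.0], np.cumsum(p)])            # prefix sums
        s2 = np.concatenate([[0.0], np.cumsum(np.square(p))])

        def flat_cost(i, j):   # squared error of one flat bin over p[i:j]
            m = (s1[j] - s1[i]) / (j - i)
            return (s2[j] - s2[i]) - (j - i) * m * m

        INF = float("inf")
        cost = np.full((B + 1, n + 1), INF)
        cut = np.zeros((B + 1, n + 1), dtype=int)
        cost[0, 0] = 0.0
        for b in range(1, B + 1):
            for j in range(b, n + 1):
                for i in range(b - 1, j):
                    c = cost[b - 1, i] + flat_cost(i, j)
                    if c < cost[b, j]:
                        cost[b, j], cut[b, j] = c, i
        # Recover the bin boundaries by backtracking.
        bounds, j = [n], n
        for b in range(B, 0, -1):
            j = cut[b, j]
            bounds.append(j)
        return bounds[::-1], cost[B, n]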
Testing Identity of Structured Distributions
We study the question of identity testing for structured distributions. More precisely, given samples from a structured distribution q over [n] and an explicit distribution p over [n], we wish to distinguish whether q = p versus q is at least ε-far from p, in L1 distance. In this work, we present a unified approach that yields new, simple testers, with sample complexity that is information-theo...
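For intuition, a sketch of an identity tester in this spirit: a chi-square-type statistic whose rejection threshold is calibrated by Monte Carlo simulation under p. This is a simple baseline under stated assumptions, not the sample-optimal tester of the paper.

    import numpy as np

    def identity_test(samples, p, n_null=2000, alpha=0.05, seed=0):
        # H0: the samples come from p. Reject when a chi-square-type
        # statistic exceeds its simulated (1 - alpha) null quantile.
        rng = np.random.default_rng(seed)
        n, m = len(p), len(samples)

        def stat(counts):
            e = m * p
            return np.sum((counts - e) ** 2 / np.maximum(e, 1e-12))

        obs = stat(np.bincount(samples, minlength=n))
        null = np.array([stat(np.bincount(rng.choice(n, size=m, p=p),
                                          minlength=n))
                         for _ in range(n_null)])
        return obs > np.quantile(null, 1 - alpha)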
A Note on Log-Concavity
This is a small observation concerning scale mixtures and their log-concavity. A function f(x) ≥ 0, x ∈ ℝ^n, is called log-concave if f(λx + (1 − λ)y) ≥ f(x)^λ f(y)^{1−λ} (1) for all x, y ∈ ℝ^n, λ ∈ [0, 1]. Log-concavity is important in applied Bayesian statistics, since a distribution with a log-concave density is easy to treat with many different approximate inference techniques. For example, log-concavit...
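A standard worked example (not taken from the note itself): the standard normal density satisfies definition (1), since its logarithm is concave:

    \[
    f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}, \qquad
    \log f(x) = -\frac{x^{2}}{2} - \frac{1}{2}\log(2\pi), \qquad
    (\log f)''(x) = -1 < 0,
    \]

so log f is concave, and exponentiating the concavity inequality for log f gives exactly (1).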
Journal: Computational Statistics & Data Analysis
Volume 51, Issue: -
Pages: -
Publication date: 2007